Experimental Methods in
Cognitive Psychology
Michael Dougherty
University of Maryland
Reproduction of this material is prohibited without authorization by the author.
The Science of Psychology
Once upon a time I was a skeptic of the idea that psychology was a science. It was my lay impression that psychology dealt with people, and was mostly concerned with how to make people feel good, or cope, or whatever. Although I soon came to realize that psychology was a science, I still had the impression that it was not a full-fledged science: it was too different from biology, chemistry, and physics to be considered in the same breath as those disciplines. After all, biology, chemistry, and physics are hard sciences, and psychology did not seem to fit with the hard sciences. I now, of course, see why I didn't view psychology as a fully developed science. I was naïve, I didn't really understand what psychology was about, and I certainly didn't understand what made something a science. I now view psychology as every bit as much a science as physics. Indeed, the idea that psychology is not as hard a science as physics and chemistry is even upsetting to me.
So what changed in my thinking over the
years? How did I come to view psychology as a “hard” science? Part of my
problem was that I let my naivety get the best of me. I knew physics and
chemistry and biology were sciences, so I let myself define science by
discipline: Certain disciplines were sciences and others simply weren’t. But obviously, defining science by
discipline does little to help us recognize new sciences as they emerge.
Perhaps we are inclined to use the goals of science to help us define science:
sciences are those disciplines that deal with facts, or that have the goal of
uncovering facts of nature. Or, perhaps sciences are those disciplines that
include elaborate theories. However, if we rely solely on these criteria, we
would be left including history and/or music among the sciences. Certainly,
history and music are not to be considered sciences, so what sets them apart
from the true scientific disciplines?
If there is one thing we can point to that helps us identify scientific disciplines, it is the process by which we search for facts. All sciences are concerned with facts, but it is the process by which they search for those facts that sets them apart from non-scientific disciplines, such as history, that also search for facts. Modern sciences are remarkably similar with respect to the process of discovering facts – a process that we refer to as the scientific method.
We usually characterize the scientific method as consisting of four stages: (1) forming a hypothesis and operationalizing its variables, (2) making observations, (3) replicating, and (4) developing a law or theory.
· Take a few moments to think about how you would decide what constitutes a dyad and what constitutes a door hold. Write your definitions down here and compare them with your classmates.
Observation.
Once the hypothesis is formed and our variables operationalized, we must test the hypothesis – that is, we need to make observations. Observations are made using formal methodologies, which will be introduced in the next chapter. Methodologies can be divided into four general classes: controlled (or true) experiments, quasi-experiments, surveys, and naturalistic observation. Controlled or true experiments are observations that take place in the lab under strict laboratory controls – they are perhaps what we think of when we think of science. The hallmark of a true experiment is that participants are randomly assigned to conditions. Quasi-experimental designs involve comparisons between naturally occurring variables. For example, any comparison between males and females, between different ethnic groups, or between different age groups is quasi-experimental, because people cannot be randomly assigned to these naturally occurring groups.
Surveys are observations that are made to discover or explore the relationship
among variables. Surveys are often quasi-experimental, in that the data that
are obtained are frequently correlational data.
Naturalistic observations are observations that we make of the phenomenon in its naturally occurring environment.
All four methods serve the purpose of formalizing how we make our observations, but they do not all allow the same types of conclusions. For example, inferences about causation can only be made using a true experiment. Naturalistic observations do not allow us to infer causation, but they do allow us to explore the phenomenon of interest in the setting in which it naturally occurs. Surveys are most often used to identify correlational relationships and most frequently do not allow causal inference.
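Because random assignment is the feature that licenses causal inference, it is worth seeing how mechanically simple it is. Below is a minimal illustrative sketch in Python (the participant IDs and condition names are made up for the example, not taken from any real study): each participant has an equal chance of ending up in either condition.

import random

# Hypothetical participant pool and conditions (for illustration only).
participants = [f"P{i:02d}" for i in range(1, 21)]    # twenty participant IDs
conditions = ["condition A", "condition B"]           # two made-up conditions

random.seed(1)                 # fixed seed so the example is reproducible
random.shuffle(participants)   # put the participants in a random order

# Deal the shuffled participants into the conditions like cards; every
# participant is equally likely to land in either condition.
assignment = {name: participants[i::len(conditions)] for i, name in enumerate(conditions)}

for name, group in assignment.items():
    print(name, group)

A quasi-experiment, by contrast, has no analogue of the shuffle: group membership (sex, age group, ethnicity) comes pre-assigned by nature.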
Replicability.
Replication is as important to the scientific
process as theory and observation. After all, we can’t build theory without
being confident that the results we obtain in an experiment are reliable. So we
replicate. Replication achieves multiple goals. The first is that it increases our confidence that the effect we observed is in fact real. Suppose we run the same experiment 100 times and obtain an effect exactly 5 times. How much confidence would you have in this effect? Probably not much, and certainly not as much as in an effect that replicates 99 times out of 100. Note that finding an effect five times out of 100 chances is exactly what we would expect if there were no real effect present and we use a significance level of α = .05 in our statistical tests. That is, inherent statistical properties present in the world will ensure that we erroneously observe an effect of our variable some of the time, even if the true state of affairs is that there is no effect.
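To see that 5-in-100 figure emerge on its own, here is a small illustrative simulation in Python (it assumes NumPy and SciPy are available; the population values are arbitrary). Every simulated "experiment" compares two samples drawn from the same population, so any significant result is, by construction, an error.

import numpy as np
from scipy import stats   # assumes SciPy is installed

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 100
false_positives = 0

for _ in range(n_experiments):
    # Both "conditions" are drawn from the same population, so the true
    # state of affairs is that there is no effect of the variable.
    control = rng.normal(loc=100, scale=15, size=30)
    treatment = rng.normal(loc=100, scale=15, size=30)
    _, p = stats.ttest_ind(control, treatment)
    if p < alpha:
        false_positives += 1   # an erroneous "effect" found where none exists

print(f"{false_positives} of {n_experiments} null experiments came out significant")
# On average this number hovers around alpha * n_experiments = 5.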
Remember that thing about the Type I error rate you learned in your statistics class? Well, replication helps keep us from committing Type I errors – that is, it keeps us from falsely concluding that a null hypothesis is false when it is actually true. In this sense, replication is one way of protecting ourselves against violations of a second type of validity – statistical conclusion validity.
A second goal of replication is to test the
robustness or generality of the effect. Does the variable of interest have an
effect across a variety of populations? Do we find the same effect with the
elderly as we do with college students? Does the variable have an effect when we
modify the task? All these questions are answered through replication. Note that the answers to these questions address yet a third type of validity – external validity.
Develop law or theory.
Developing a law or theory is both the most challenging and the most enjoyable part of science. It's challenging because the theory must be specific enough to yield precise empirical predictions, yet general enough to summarize existing datasets. It's the most enjoyable because theories are the lens through which we view the world. Without theories, we would be left explaining the world with the very observations we're trying to explain.
Take, for example, the serial position effect in free recall. In the typical serial position experiment, participants study a list of, say, 20 words. The words are presented one at a time at a constant rate (e.g., each word is presented for 4 seconds). In one variant of this task, participants are asked to recall as many words as they can immediately after the last word is presented. The recall task merely requires participants to output the words, regardless of order. This task is repeated several times for each participant. Note that some words are presented towards the beginning of the list, some towards the middle, and others just prior to participants being asked to recall. In examining serial position effects, the experimenter is interested in knowing: Of those words that were presented first in the serial order, how many were recalled? Of those presented second, how many were recalled? And so on. By tabulating the results this way, we can get a sense of how well participants are able to recall as a function of the serial position of the words on the list.
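The tabulation itself is simple bookkeeping. The sketch below (in Python, with made-up recall data and short six-word lists for brevity) shows the logic: for each serial position, count how often the word studied in that position was recalled, then divide by the number of lists.

# Hypothetical data: each trial lists the studied words in presentation order
# and the set of words the participant recalled (order of recall is ignored).
trials = [
    {"studied": ["cat", "door", "lamp", "tree", "fish", "coin"],
     "recalled": {"cat", "fish", "coin"}},
    {"studied": ["book", "rain", "sand", "wolf", "milk", "rope"],
     "recalled": {"book", "milk", "rope"}},
]

list_length = len(trials[0]["studied"])
times_recalled = [0] * list_length

for trial in trials:
    for position, word in enumerate(trial["studied"]):
        if word in trial["recalled"]:
            times_recalled[position] += 1

# Proportion recalled as a function of serial position (1 = first word studied).
for position, count in enumerate(times_recalled, start=1):
    print(f"Serial position {position}: {count / len(trials):.2f}")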
Typical serial position effects are shown in
the figure below. The y-axis plots the mean percent correct for each serial
position. The x-axis plots the serial position.
Two things should be apparent from this
graph. First, recall is never perfect. Second, words at the beginning and end
of the list show better recall than words in the middle serial positions. The
enhanced recall for the beginning of the list is referred to as a primacy
effect and the enhanced recall for the end of the list is referred to as a
recency effect.
It is important to distinguish between
effects and theories. Effects are descriptive labels we give to empirical
phenomena. Theories are explanations of why the effects obtain. If I asked you to give a theoretical explanation of serial position effects, you might start by telling me that recall is better for words at the beginning and the end of an ordered list relative to the middle. This is an accurate description
of the data. Primacy and recency are not theoretical accounts of this
description; they are merely labels that we use to summarize the empirical
finding. The theoretical account is directed at answering the question “Why is
recall better for words at the beginning and end of an ordered list?” Or, “Why
do primacy and recency effects obtain?”
Obviously, to account for primacy and recency effects we need to postulate a system that can plausibly produce these effects. As we will discuss later in this course, one such theory is the modal model. This theory postulates two different memory stores, a short-term memory (STM) and a long-term memory (LTM), along with a few assumptions about what these stores are responsible for. Not only does this theory provide an account of primacy and recency, but it also provides new predictions that can be tested to further substantiate the theory.
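To give a feel for how such a theory generates predictions, here is a deliberately toy sketch in Python written in the spirit of the modal model. It is not the published model: the buffer size, transfer probability, and displacement rule are illustrative assumptions. A small rehearsal buffer stands in for STM; while an item sits in the buffer it has some chance on each study step of being copied into LTM; at test, the participant reports whatever is still in the buffer plus whatever reached LTM.

import random

def simulate_free_recall(list_length=20, buffer_size=4, p_transfer=0.15,
                         n_lists=2000, seed=0):
    """Toy buffer-plus-store model; a sketch, not the full modal model."""
    rng = random.Random(seed)
    times_recalled = [0] * list_length

    for _ in range(n_lists):
        buffer, ltm = [], set()
        for item in range(list_length):
            # The new item enters the rehearsal buffer; if the buffer is
            # full, a randomly chosen occupant is displaced.
            if len(buffer) == buffer_size:
                buffer.pop(rng.randrange(buffer_size))
            buffer.append(item)
            # Each item currently being rehearsed has a chance of being
            # copied into the long-term store on every study step.
            for occupant in buffer:
                if rng.random() < p_transfer:
                    ltm.add(occupant)
        # At test: report the buffer contents (recency) plus LTM (primacy).
        for item in set(buffer) | ltm:
            times_recalled[item] += 1

    return [count / n_lists for count in times_recalled]

for position, p in enumerate(simulate_free_recall(), start=1):
    print(f"{position:2d} {'#' * round(p * 40)} {p:.2f}")

Run it and the simulated recall curve shows primacy (early items spend more time in the buffer, so they are more likely to reach LTM) and recency (the last few items are still in the buffer at test). The point is not that this little program is correct; it is that once the theory's assumptions are written down explicitly, they yield predictions that can be checked against new experiments.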
It’s important to point out that theories are
nothing more than explanations that explain our observations. Ideally, these
theories are parsimonious – they are as simple as possible. Why should we
prefer simple explanations to complex explanations? One reason is that we want
to make a few assumptions as necessary to account for the data. A second reason
is that simple explanations are easier to construct, and easier to understand.
That’s not to say that we shouldn’t construct complex theories, but merely to
say that we should start as simple as possible. As a rule, we start simple and
add to our theory only when the data necessitate it. Parsimony is one property
of a good theory.
The second property of a good theory is that
it can be tested, and potentially disconfirmed. That is, if a theory makes such
a wide range of predictions that it can account for any possible result, it
can’t be tested. No matter how the experiment is designed, the outcome will be
consistent with the theory, regardless of whether the theory is actually
correct. When theories do not allow for the possibility of being incorrect
(i.e., they predict any conceivable result), they are said to lack
falsifiability.
Falsifiability relates to two philosophical
views of theory testing: Confirmationism and falsificationism. Confirmationism
states that theories gain credence when experiments confirm their predictions.
Thus, the more experiments that are done that confirm a theory, the more
confidence we can hold in the accuracy of the theory.
Confirmationism, however, only provides
evidence consistent with the theory that is being evaluated and tells us
nothing about the space of possible alternative theories. As an example,
suppose there are three theories that logically account for a single
experimental result. Suppose we now run a new experiment, and the results again
confirm all three theories. Should our confidence in any particular theory increase as a function of this experiment? No! However, suppose that we instead
design an experiment that, if it comes out one way, confirms one theory and
disconfirms another. In this case, we are justified in increasing our
confidence in the theory that is not disconfirmed.
The idea of designing experiments that can
disconfirm a theory stems from falsificationism. The basic premise of
falsificationism is that confidence in any particular theory should increase by
virtue of eliminating alternatives and by setting up experiments that, if they come out one way, will disconfirm the theory.
If we cannot design an experiment that can
distinguish between competing theories, the competitors would be said to lack identifiability.
To return to the topic of properties of good theories: the second property, testability, rests on a closely related requirement – good theories are coherent, both logically and empirically. That is, theories should make sense logically – they should not be able to make two contradictory predictions. Moreover, good theories should allow for the possibility that they will be disconfirmed – can an experiment be designed that could potentially produce results outside the boundaries of the theory’s predictions? This does not mean that data will ever actually be observed that disconfirm the theory, just that the theory’s range of predictions is not so wide that it can account for every plausible result.
Theories that satisfy logical coherence are good because they are falsifiable – they make predictions that, if disconfirmed, would render the theory incorrect or incomplete. Falsifiability is an important property of a theory. If a theory can account for several contradictory empirical findings simultaneously, then it may as well not make any predictions at all. That is, if a theory can account for all possible outcomes of all possible experiments, it lacks falsifiability. If it lacks falsifiability, then it cannot be tested: What does it mean to confirm a theory if the outcome of the experiment is predicted by the theory regardless of how the experiment comes out?
A third property of good theories is that
they both explain and predict. We want theories that can explain all of our
past observations, but we also want to be able to use our theories to make new
predictions that can be tested empirically. The ability to make new, testable
predictions means that there is the potential for the theory to be falsified.
If the new predictions cannot be supported empirically, we can conclude that
the theory is somehow flawed, and is in need of modification (or even
abandonment). If the new predictions are supported empirically, then the theory
gains momentum. Theories that survive repeated attempts to be falsified may
eventually be accepted as law.
· Think about how the modal model might be able to predict new experimental results.
Assumptions of Science
All scientific endeavors make at least four
assumptions. One assumption is that the world and nature are lawful –
that is, events are ordered and predictable. In the absence of lawfulness,
events would be random, and experimentation impossible (or at least
unfruitful).
The second assumption is that the world is a deterministic
place. Events in the world and nature have causes. Things happen as the result
of a causal force. In physics and chemistry, true determinism is possible. That
is, every time A occurs, it causes B. However, we need not be so rigid in how
we think of determinism. We can think about determinism in the probabilistic
sense: If A, then the likelihood of B increases. In fact, probabilistic
determinism is the norm in most sciences, especially in the behavioral and
medical sciences. For example, we all accept the fact that smoking causes
cancer. However, the likelihood that one develops cancer as a result of smoking is less than 100%. That is, there is a probabilistic relationship between smoking and
cancer. What we’re interested in is whether variable A increases or decreases
the likelihood of variable B.
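To make the probabilistic sense of determinism concrete, here is a tiny worked example in Python with entirely hypothetical counts (not real epidemiological data): the question is not whether B always follows A, but whether the likelihood of B is higher when A is present.

# Hypothetical 2 x 2 table of counts (illustrative numbers only).
exposed   = {"outcome": 180, "no_outcome": 820}   # variable A present
unexposed = {"outcome":  60, "no_outcome": 940}   # variable A absent

def likelihood(group):
    return group["outcome"] / (group["outcome"] + group["no_outcome"])

p_given_a     = likelihood(exposed)      # P(B | A)
p_given_not_a = likelihood(unexposed)    # P(B | not A)

print(f"P(outcome | A present): {p_given_a:.2f}")      # 0.18
print(f"P(outcome | A absent):  {p_given_not_a:.2f}")  # 0.06
print(f"A raises the likelihood by a factor of {p_given_a / p_given_not_a:.1f}")
# Neither probability is 1.0, so A does not guarantee B, but A clearly
# increases the likelihood of B -- probabilistic determinism.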
The third assumption that we make is that empiricism
is possible. Lawfulness and determinism do us little good if we cannot observe
or measure the phenomena of interest.
The fourth assumption is that of parsimony.
The assumption of parsimony simply states that we prefer simple explanations or theories to complex ones. If a simpler explanation of the phenomenon is sufficient to account for the data, then it is preferred to the more complex explanation. Parsimony also makes it easier to extrapolate from our findings.
Measurement in Science
Measurement is paramount to science. We cannot study phenomena that cannot be measured. One can think of two general classifications of measurement: direct and indirect. Direct measurement involves being able to directly observe, through our senses, the phenomena of interest. For example, with a powerful enough telescope, an astronomer can directly observe the motion of the planets in our solar system. Chemists can measure the temperature and speed of chemical reactions and the change in molecular structure and mass. Astronomers often rely on pictures and measurements taken by satellites or space probes (such as the Mars Observer). And, for a time, during the Behaviorist movement, American psychology relied almost exclusively on direct measurement to assess and describe behavior.
However, direct measurement cannot be used in all observations – when direct measurement isn’t possible, we look for indirect measurements. Indirect measurements are proxies for direct measurements – they tap the residue left over from the phenomenon. Sometimes we can’t directly observe the phenomenon of interest, but we can observe its by-products.
Case in point: the last several years have witnessed a boon for astronomy and the discovery of many new planets. These new discoveries have been made not by directly spotting the planets through light or radio telescopes (many of these new planets are far too small and dim to be seen through a light telescope), but by identifying their “signatures”. A telescope works by measuring the radiation that objects emit or reflect, and that radiation carries information about the objects that produced it: the light from a star, for example, can tell us which gases the star contains and how the star is moving. Relying on rather simple laws of physics, it is therefore possible to identify a new planet by looking for its trace. When a massive object circles a star, it produces gravitational tugs and pulls on the star. All this pulling and tugging produces a slight wobble in the star’s motion. We don’t see the planet itself, but we do see the residual effects of the planet – its effect on the star. Thus, by measuring the extent of the gravitational pull and the regularities (or lawfulness) of the wobble, astronomers can calculate: (a) whether a planet is present, (b) its approximate mass, and (c) its orbit.
A lot of what we
want to observe in psychology is not directly observable. We can’t peer into
the minds of our participants to see what or how they are thinking. We can,
however, measure the residual effects of their mental processes. We can’t see
what happens to the memory trace as a result of having you use an elaborative
rehearsal strategy. But we can measure
the residual effects of elaborative rehearsal: that which is left over as a
result of elaborative rehearsal. We will discuss at length paradigms in which
we can learn about the properties and functioning of the mind by tapping into
the by-products of the mental functions.
A good example of the use of indirect measurement in cognitive psychology is illustrated by a paradigm developed by George Sperling in the 1960s. Sperling was interested in the capacity of what we call iconic memory – how much information the visual system can hold. In cognitive psychology, we typically think of at least three types of memory: very short-term sensory memory, short-term/working memory, and long-term memory. Sensory memory is assumed to last no more than 1-2 seconds; it’s that very brief interval in which you first perceive information before you become conscious of it. In the case of iconic memory, one might think of it as visual persistence – the visual image that is retained on the retina after sensing an object. Short-term memory is typically thought of as lasting no longer than 30 seconds, and long-term memory is your relatively permanent record of past events. One can think of short-term memory as that information that you are currently consciously aware of.
In Sperling’s case, he was interested in the capacity of iconic memory – how much information the visual system could code. However, he faced a special problem in investigating this phenomenon: How do we measure how much information can be held in sensory memory? In his initial attempts, Sperling developed what is now called the full-report procedure. Participants are flashed a 3 × 4 array of letters and then required to report verbally the letters in that array (after a delay of 0-5 seconds). A typical trial of this procedure is given below:
Fixation point (500 msec)
Letter array (50 msec):
C F P Y
J M B X
S G R L
Blank interval (0 – 5 sec)
Report letters
The results from this procedure can give us two types of data. First, we can examine how much information is held in iconic memory by the number of items reported. Second, we can examine how long that information is retained in iconic memory. For example, if participants can no longer report the contents of the array after a few seconds, this would suggest that the contents of iconic memory had faded. Results using the full-report procedure revealed that participants could accurately report around 37% of the letters in the array, suggesting that the capacity of iconic memory was limited to about 4.5 items. In addition, accuracy decreased markedly after only a few seconds, suggesting that information remains active in iconic memory for only a very brief period of time.
· Can you think of why the full-report procedure might give us an inaccurate estimate of the capacity of sensory memory? To answer this question, it might help to consider the methodology in the context of our assumption about the duration of sensory memory.
Note that the full-report procedure has a built-in confound. If the duration of iconic memory is really only a few seconds, and participants have to report verbally the contents of their memory, and the reporting of the letters takes time, then it is possible that participants’ iconic memory is holding more than they can report. For example, suppose that it takes 500 msec to report each letter. By the time participants report 4 letters, 2 seconds have elapsed. Thus, by the time participants report 4 letters, anything else that may have been encoded in iconic memory would have decayed. Perhaps iconic memory has a larger capacity than can be measured using the full-report procedure.
To address this potential concern, Sperling developed a second method, called the partial-report procedure. The same basic paradigm was used, except that just prior to being asked to report, participants were instructed which row was to be reported. The instruction was really just an auditory tone: a high-pitched tone signaled the top row, a medium-pitched tone the middle row, and a low-pitched tone the bottom row. Note that the instruction to retrieve a particular row cannot affect what participants try to study in the array, since they do not know which row they will have to report until just prior to the report.
Fixation point (500 msec)
Letter array (50 msec):
C F P Y
J M B X
S G R L
Blank interval (0 – 5 sec)
Tone cue for report
Report letters from the cued row
Using this new procedure, Sperling found that participants could report 3-4 letters (about 76%) from each cued row. By extension, then, iconic memory must be able to store around 9-10 letters, or roughly 75% of the 12-letter display. The ultimate conclusion of this line of research was based on the results of the partial-report procedure.
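The extrapolation behind that conclusion is simple arithmetic. The sketch below (Python, with values in the ballpark of those just described; the exact numbers vary with cue delay and across studies) spells it out: whatever fraction of a cued row can be reported is assumed to hold for every row, because the participant could not know in advance which row would be cued.

ARRAY_ROWS, ARRAY_COLS = 3, 4
array_size = ARRAY_ROWS * ARRAY_COLS              # 12 letters per display

# Full report: roughly 4.5 letters of the whole array could be reported.
full_report_estimate = 4.5
print(f"Full-report estimate:    {full_report_estimate:.1f} of {array_size} "
      f"({full_report_estimate / array_size:.1%})")

# Partial report: roughly 3 of the 4 letters in the cued row (illustrative).
letters_per_cued_row = 3.0
partial_report_estimate = letters_per_cued_row * ARRAY_ROWS
print(f"Partial-report estimate: {partial_report_estimate:.1f} of {array_size} "
      f"({partial_report_estimate / array_size:.1%})")

# The partial-report estimate (~9 letters) is what supports the conclusion
# that iconic memory holds far more than the full report alone suggests.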
What do the full- and partial-report procedures tell us about measurement in cognitive psychology? First, we are inferring capacity using an indirect measure – the number of items that can be retrieved. Second, we are inferring duration using an indirect measure – how long it takes for those items to decay from iconic memory. Finally, we see that the methodology used in the measurement process affects our conclusions. Had Sperling not developed the partial-report procedure, the conclusions would have been dramatically different. Note that the methodology we use to measure the underlying construct (the capacity of iconic memory) is tied to our theory. Our measurements are no better than our methods – if our methods are tainted or flawed, our measurement is tainted or flawed, and as a result, our scientific conclusions may be inaccurate.
Observation through simulation.
Sometimes the systems we want to observe are not easily observed as a whole, so we develop simulation models. Computer simulation models allow us to predict events or patterns of events by instantiating certain assumptions within the context of a computer program. Simulations of ocean currents use assumptions based on topographical data and sea temperatures. Simulations of tidal waves are built from underlying topographical data, fault lines in the seafloor, and assumptions about the force of certain types of underwater landslides and the effect of these landslides on water movement (again using empirical data to derive the assumptions).
One of my favorite simulation models was constructed to describe the behavior of bubbles in beer:
Example: Why do some bubbles in Guinness Stout sink rather than rise? Simulations of Guinness Stout, carried out by a group of scientists at the University of New South Wales in Sydney, Australia, reveal that it has to do with the size of the bubbles. Bubbles smaller than about 50 – 60 micrometers have too little buoyancy and momentum to resist the downward currents. Apparently it also has something to do with the shape of the glass into which the beer is poured: the barrel-chested Guinness glass is the best for demonstrating the effect. The simulation model is based on fluid dynamics. No actual observations were made, just a simulation.
Simulation models serve several purposes, but the most important among them is that they enable us to explore the expected relationships among variables as if our theory were correct – they give us a way to observe the behavior without really observing behavior. Thus, simulation models can be an enormous help in guiding experimentation. In this sense, simulation models have heuristic value, as our choice of which experiments to develop can be informed by our theory.
The realism of models. Models by their very nature are simplifications of the real world. Just as a model airplane is a simplification of a real airplane, scientific models are simplified representations of the world. Thus, there is nothing real about a model at all, except that it is intended to capture the essence of some aspect of the real world. This is an important point to keep in mind as you encounter models of cognition in this class. Regardless of their apparent sophistication, all models are false because they are simplified versions of the world they intend to represent! Mathematically specified models are no more realistic than non-mathematical models. Moreover, the fact that a model provides good fits to the data is not, in itself, a good reason to believe the model is correct. Indeed, a model can be fundamentally incorrect, yet fit data well.
Summary.
You fill in
the blanks here.